 standardized test


Bottom-Up and Top-Down Analysis of Values, Agendas, and Observations in Corpora and LLMs

arXiv.org Artificial Intelligence

Large language models (LLMs) generate diverse, situated, persuasive texts from a plurality of potential perspectives, influenced heavily by their prompts and training data. As part of LLM adoption, we seek to characterize - and ideally, manage - the socio-cultural values that they express, for reasons of safety, accuracy, inclusion, and cultural fidelity. We present a validated approach to automatically (1) extracting heterogeneous latent value propositions from texts, (2) assessing resonance and conflict of values with texts, and (3) combining these operations to characterize the pluralistic value alignment of human-sourced and LLM-sourced textual data.


The End of Scantron Tests

The Atlantic - Technology

Through funding cuts and bumps, integration and resegregation, panics and reforms, world wars and culture wars, American students have consistently learned at least one thing well: how to whip out a No. 2 pencil and mark exam answers on a sheet printed with row after row of bubbles. Whether you are an iPad baby or a Baby Boomer, odds are that you have filled in at least a few, if not a few hundred, of these machine-graded multiple-choice forms. They have long been the key ingredient in an alphabet soup of standardized tests, both national (SAT, ACT, TOEFL, LSAT, GRE) and local (SHSAT, STAAR, WVGSA). And they are used in both $50,000-a-year academies and the most impoverished public schools, where the classic green or blue Scantron answer sheets can accompany daily quizzes in every subject. Machine grading, now synonymous with the brand Scantron the way tissues are with Kleenex, is so popular because it can provide rapid and straightforward results for millions of students.


BLUEX: A benchmark based on Brazilian Leading Universities Entrance eXams

arXiv.org Artificial Intelligence

One common trend in recent studies of language models (LMs) is the use of standardized tests for evaluation. However, although Portuguese is the fifth most spoken language worldwide, few such evaluations have been conducted in it. This is mainly due to the lack of high-quality datasets available to the community for carrying out evaluations in Portuguese. To address this gap, we introduce the Brazilian Leading Universities Entrance eXams (BLUEX), a dataset of entrance exams from the two leading universities in Brazil: UNICAMP and USP. The dataset includes annotated metadata for evaluating the performance of NLP models on a variety of subjects. Furthermore, BLUEX includes a collection of recently administered exams that are unlikely to be included in the training data of many popular LMs as of 2023. The dataset is also annotated to indicate the position of images in each question, providing a valuable resource for advancing the state-of-the-art in multimodal language understanding and reasoning. We describe the creation and characteristics of BLUEX and establish a benchmark through experiments with state-of-the-art LMs, demonstrating its potential for advancing the state-of-the-art in natural language understanding and reasoning in Portuguese.


Should We Pause AI?

#artificialintelligence

At a recent White House press conference, a Fox News correspondent asked the Biden administration's press secretary about AI safety researcher Eliezer Yudkowsky's highly publicized claim that if we don't pause or halt the development of artificial intelligence, then "literally everyone on earth will die." The question was met with some laughter from the White House press corps. But as someone with a technical background who covers AI and talks regularly to researchers, developers, and investors in the field, I saw nothing to chuckle at. Rather, I and other more optimistic AI watchers worry that overly dire warnings of imminent AI-driven destruction may cause us to pause or halt the development of a powerful technology with immense potential for improving our lives. Insiders hold a truly wide range of opinions on the best way to approach AI--from Yudkowsky's insistence that we immediately abandon all research in the area, to my own more moderate concern about large-scale industrial accidents arising from misuse of the technology, to an extreme optimism in some quarters about AI's potential to turn humanity into an immortal, star-spanning species.


Riiid Launches R.test, an AI-Powered SAT Preparation Platform

#artificialintelligence

Riiid, a leading provider of AI-powered education solutions and member company of Born2Global Centre, has announced the launch of R.test, an AI-powered platform designed to help students prepare for standardized tests, including the digital SAT and ACT. With R.test, students can predict test scores with high accuracy in only a quarter of the time it takes to complete full mock tests. R.test's AI engine provides students with an overview of their current test-taking habits and predicts the correctness of answers for questions that they have not solved. The platform analyzes students' weaknesses and offers actionable guidance on how to improve, including an AI-curated selection of relevant practice questions. It also provides students with an analysis of their time spent answering questions, highlighting areas where they need to work more quickly.


Is This the Singularity for Standardized Tests?

The Atlantic - Technology

Last fall, when generative AI abruptly started turning out competent high-school- and college-level writing, some educators saw it as an opportunity. Perhaps it was time, at last, to dispose of the five-paragraph essay, among other bad teaching practices that have lingered for generations. Universities and colleges convened emergency town halls before winter terms began to discuss how large language models might reshape their work, for better and worse. But just as quickly, most of those efforts evaporated into the reality of normal life. Educators and administrators have so many problems to address even before AI enters the picture; the prospect of utterly redesigning writing education and assessment felt impossible.


A Method to Predict Semantic Relations on Artificial Intelligence Papers

arXiv.org Artificial Intelligence

Predicting the emergence of links in large evolving networks is a difficult task with many practical applications. Recently, the Science4cast competition has illustrated this challenge by presenting a network of 64,000 AI concepts and asking the participants to predict which topics are going to be researched together in the future. In this paper, we present a solution to this problem based on a new family of deep learning approaches, namely Graph Neural Networks. The results of the challenge show that our solution is competitive even if we had to impose severe restrictions to obtain a computationally efficient and parsimonious model: ignoring the intrinsic dynamics of the graph and using only a small subset of the nodes surrounding a target link. Preliminary experiments presented in this paper suggest the model is learning two related, but different patterns: the absorption of a node by a sub-graph and the union of denser sub-graphs. The model seems to excel at recognizing the first type of pattern.


Eighth grader builds IBM Watson-powered AI chatbot for students making college plans

#artificialintelligence

While her peers reveled in an unprecedented virtual school year, the self-described "technology enthusiast," Harita Suresh, 13, was bored. She decided on an online course and settled on IBM Skills Network's "AI chatbots without programming." She lacked experience with artificial intelligence, but was eager to learn through the self-paced course. Harita is more than a little familiar with tech. "I have been interested in technology since I was 5," she said. "My first coding challenge was the Lightbot Hour of Code. I was fascinated that the code I wrote could control the actions of the characters on screen. Since then, I pursued coding on multiple platforms like code.org. The more I learned about tech, the more I wanted to know. In fifth grade, I took a Python programming course offered by Georgia Tech."


AI Stats News: 69% Of IT Executives Say They Cannot Respond To Cybersecurity Threats Without AI

#artificialintelligence

The recent surveys, studies, forecasts and other quantitative assessments of the health and progress of AI highlighted the role of AI in cybersecurity defense and in scoring standardized tests, the relationships between data migrating to the cloud and data modernization, lax security standards for IoT devices, and the finding that the U.S. still leads the global race for AI domination but that China is making more rapid progress. Written essays on standardized tests in 18 states in the U.S. are currently graded by natural language processing (NLP) software ("automated essay scoring engines"), with only a small percentage of students' essays (it varies between 5% and 20%) randomly selected for a human grader to double-check the machine's work [VICE]. Based on a 100-point scale, the U.S. leads the global AI race with 44.2 points, followed by China with 32.3 points and the European Union with 23.5 points; the U.S. came out on top in talent, research, development, and hardware, while China led in adoption and data. Airbus Americas has been using AI from AppZen to review expense reports and determine if they are in compliance with company policies; in the first partial year of the technology's implementation Airbus paid off its initial investment of $50,000 and pocketed about $50,000 more; the division expects to save $100,000 this year and at least $200,000 in 2020 in the Americas; worldwide implementation is expected to result in several millions of dollars in savings [The Wall Street Journal].